    A Unified Infrastructure for Monitoring and Tuning the Energy Efficiency of HPC Applications

    High Performance Computing (HPC) has become an indispensable tool for the scientific community, enabling simulations of models whose complexity exceeds the limits of a standard computer. An unfortunate trend is that the power consumption of HPC systems under demanding workloads keeps increasing. To counter this trend, hardware vendors have implemented power saving mechanisms in recent years, which has increased the variability in the power demands of individual nodes. These capabilities provide an opportunity to increase the energy efficiency of HPC applications. To use these hardware power saving mechanisms efficiently, their overhead must be analyzed. Furthermore, applications have to be examined for performance and energy efficiency issues, which can provide hints for optimizations. This requires an infrastructure that can capture performance and power consumption information concurrently. The mechanisms that such an infrastructure inherently supports can further be used to implement a tool that performs both the measuring and the tuning of energy efficiency. This thesis targets all steps in this process by making the following contributions: First, I provide a broad overview of related fields, listing common performance measurement tools, power measurement infrastructures, hardware power saving capabilities, and tuning tools. Second, I lay out a model that defines and describes energy efficiency tuning at the scale of program regions. This model includes hardware-dependent and software-dependent parameters. Hardware parameters include the runtime overhead and the delay for switching power saving mechanisms, as well as their scopes and their possible influence on application performance. Thus, in a third step, I present methods to evaluate common power saving mechanisms and list findings for different x86 processors. Software parameters include the performance and power consumption characteristics of program regions, as well as the influence of power saving mechanisms on these. Capturing software parameters requires an infrastructure for measuring performance and power consumption; with minor additions, the same infrastructure can later be used to tune software and hardware parameters. Thus, I lay out the structure of such an infrastructure and describe the common components required for measuring and tuning. Based on that, I implement interfaces that extend the functionality of contemporary performance measurement tools, use these interfaces to conflate performance and power measurements, and further process the gathered information for tuning. I conclude this work by demonstrating that the infrastructure can be used to manipulate the power saving mechanisms of contemporary x86 processors and to increase the energy efficiency of HPC applications.
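
    A building block of such an infrastructure is a metric source that samples the processor's energy counters while the performance measurement runs. The following minimal sketch illustrates that idea, assuming the Linux powercap/RAPL sysfs interface; the device path is system-dependent, reading it may require elevated privileges, and none of this code is taken from the thesis itself.

        /* Sketch: derive the average power of a code region from the RAPL
         * package-energy counter exposed by the Linux powercap framework.
         * The path below is an assumption; check your system. */
        #include <stdio.h>
        #include <inttypes.h>
        #include <unistd.h>

        #define RAPL_PATH "/sys/class/powercap/intel-rapl:0/energy_uj"

        static uint64_t read_energy_uj(void)
        {
            FILE *f = fopen(RAPL_PATH, "r");
            uint64_t uj = 0;
            if (f) {
                if (fscanf(f, "%" SCNu64, &uj) != 1)
                    uj = 0;
                fclose(f);
            }
            return uj; /* microjoules since counter reset (wraps eventually) */
        }

        int main(void)
        {
            uint64_t before = read_energy_uj();
            sleep(1); /* stand-in for an instrumented application region */
            uint64_t after = read_energy_uj();
            printf("average power: %.2f W\n", (double)(after - before) / 1e6);
            return 0;
        }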

    Q-Learning Inspired Self-Tuning for Energy Efficiency in HPC

    System self-tuning is a crucial task for lowering the energy consumption of computers. Traditional approaches decrease the processor frequency during idle or synchronisation periods. In High-Performance Computing (HPC), however, this is not sufficient: if the executed code is load balanced, there are neither idle nor synchronisation phases to exploit. Alternative self-tuning approaches are therefore needed that can exploit the different compute characteristics of HPC programs. The novel notion of application regions based on function call stacks, introduced in the Horizon 2020 project READEX, allows us to define such a self-tuning approach. In this paper, we combine these regions with the state-action maps typical of Q-Learning, which store information about available states, possible actions, and the expected rewards. By exploiting the existing processor power interface, we provide direct feedback to the learning process. This approach saves up to 15% energy while adding only a minor runtime overhead. (Short paper, 4 pages; HPCS 2019, AHPC 2019; READEX, HAEC, Horizon 2020 grant agreement number 671657, DFG, CRC 91.)
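
    The abstract does not spell out the update rule, but a standard tabular Q-Learning step driven by an energy reading as (negative) reward looks as follows; the state/action encoding and all parameter values here are illustrative assumptions, not the paper's.

        /* Illustrative tabular Q-Learning update for region-based tuning:
         * states stand for application regions, actions for hardware
         * settings such as core frequencies (both encodings assumed). */
        #include <stdio.h>

        #define N_STATES  8   /* e.g., significant application regions */
        #define N_ACTIONS 4   /* e.g., selectable core frequencies */

        static double Q[N_STATES][N_ACTIONS];

        /* One learning step: reward r could be the negative energy consumed
         * in the region, read from the processor power interface. */
        static void q_update(int s, int a, double r, int s_next,
                             double alpha, double gamma)
        {
            double best_next = Q[s_next][0];
            for (int i = 1; i < N_ACTIONS; i++)
                if (Q[s_next][i] > best_next)
                    best_next = Q[s_next][i];
            Q[s][a] += alpha * (r + gamma * best_next - Q[s][a]);
        }

        int main(void)
        {
            /* Example: region 2 ran at frequency setting 1, consumed 42 J,
             * and the program then entered region 3. */
            q_update(2, 1, -42.0, 3, 0.1, 0.9);
            printf("Q[2][1] = %f\n", Q[2][1]);
            return 0;
        }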

    Detecting Memory-Boundedness with Hardware Performance Counters

    Modern processors incorporate several performance monitoring units, which can count events occurring in different components of the processor. They provide access to information on hardware resource usage and can therefore be used to detect performance bottlenecks. Many performance measurement tools are thus able to record them alongside information about application behavior. However, the exact meaning of the supported hardware events is often incomprehensible due to the complexity of the system and partially lacking or even inaccurate documentation. For most events it is also not documented whether a certain rate indicates saturated resource usage. It is therefore usually difficult to draw conclusions about the performance impact from the observed event rates. In this paper, we evaluate whether hardware performance counters can be used to measure capacity utilization within the memory hierarchy and to estimate the impact of memory accesses on the achieved performance. The presented approach is based on a small selection of micro-benchmarks that constantly stress individual components of the memory subsystem, ranging from the caches to main memory. These workloads are used to identify hardware performance counters that provide good estimates for the utilization of individual components of the memory hierarchy. However, since access latencies can be interleaved with computing instructions, high utilization of the memory hierarchy does not necessarily result in low performance. We therefore also investigate which stall counters provide good estimates for the number of cycles that are actually spent waiting for the memory hierarchy.
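
    To illustrate the kind of counter access involved, the sketch below reads a generic stall-cycle event through the Linux perf_event_open(2) interface around a memory-bound loop. The choice of PERF_COUNT_HW_STALLED_CYCLES_BACKEND is my example, not one of the counters identified in the paper, and not every CPU/kernel combination exposes it.

        /* Sketch: count back-end stall cycles for a code section via
         * perf_event_open(2). */
        #include <stdio.h>
        #include <inttypes.h>
        #include <string.h>
        #include <unistd.h>
        #include <sys/ioctl.h>
        #include <sys/syscall.h>
        #include <linux/perf_event.h>

        int main(void)
        {
            struct perf_event_attr attr;
            memset(&attr, 0, sizeof(attr));
            attr.type = PERF_TYPE_HARDWARE;
            attr.size = sizeof(attr);
            attr.config = PERF_COUNT_HW_STALLED_CYCLES_BACKEND;
            attr.disabled = 1;
            attr.exclude_kernel = 1;

            int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
            if (fd < 0) { perror("perf_event_open"); return 1; }

            ioctl(fd, PERF_EVENT_IOC_RESET, 0);
            ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

            /* Memory-bound stand-in workload: touch one byte per cache line. */
            static volatile char buf[64 * 1024 * 1024];
            for (size_t i = 0; i < sizeof(buf); i += 64)
                buf[i]++;

            ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
            uint64_t stalls = 0;
            if (read(fd, &stalls, sizeof(stalls)) != sizeof(stalls))
                stalls = 0;
            printf("back-end stall cycles: %" PRIu64 "\n", stalls);
            close(fd);
            return 0;
        }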

    Main Memory and Cache Performance of Intel Sandy Bridge and AMD Bulldozer

    Application performance on multicore processors is seldom constrained by the speed of floating point or integer units. Much more often, limitations are caused by the memory subsystem, particularly shared resources such as last level caches or memory controllers. Measuring, predicting, and modeling memory performance becomes a steeper challenge with each new processor generation due to growing complexity and core counts. We tackle the important task of measuring and understanding undocumented memory performance numbers in order to create valuable insight into microprocessor details. For this, we build upon a set of sophisticated benchmarks that support latency and bandwidth measurements to arbitrary locations in the memory subsystem. These benchmarks are extended to support AVX instructions for bandwidth measurements and to integrate the coherence states (O)wned and (F)orward. We then use these benchmarks to perform an in-depth analysis of current ccNUMA multiprocessor systems with Intel (Sandy Bridge-EP) and AMD (Bulldozer) processors. Using our benchmarks, we present fundamental memory performance data and illustrate performance-relevant architectural properties of both designs.
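
    The underlying benchmarks are not reproduced in the abstract, but the core of any latency measurement of this kind is serialized pointer chasing through a buffer placed at the memory level of interest. A minimal sketch, with buffer size, access order, and iteration count as arbitrary assumptions:

        /* Sketch of a pointer-chasing latency benchmark: every load depends
         * on the previous one, so time per iteration approximates the access
         * latency of the memory level the buffer maps to. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>

        int main(void)
        {
            size_t n = (64u << 20) / sizeof(void *); /* 64 MiB: exceeds caches */
            void **buf = malloc(n * sizeof(void *));
            size_t *idx = malloc(n * sizeof(size_t));
            if (!buf || !idx) return 1;

            /* Shuffle a visit order, then link the slots along it: this forms
             * one random cycle over all slots and defeats the prefetchers. */
            for (size_t i = 0; i < n; i++) idx[i] = i;
            for (size_t i = n - 1; i > 0; i--) {
                size_t j = (size_t)rand() % (i + 1);
                size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
            }
            for (size_t i = 0; i < n; i++)
                buf[idx[i]] = &buf[idx[(i + 1) % n]];
            free(idx);

            size_t iters = 1u << 24;
            void **p = &buf[0];
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (size_t i = 0; i < iters; i++)
                p = (void **)*p;
            clock_gettime(CLOCK_MONOTONIC, &t1);

            double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
            /* Print p so the chase cannot be optimized away. */
            printf("avg latency: %.2f ns (end %p)\n", ns / iters, (void *)p);
            free(buf);
            return 0;
        }

    Real benchmark suites additionally pin threads, control coherence states, and place the buffer on specific NUMA nodes; none of that is attempted here.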

    Extending the Functionality of Score-P through Plugins: Interfaces and Use Cases

    Performance measurement and runtime tuning tools are both vital in the HPC software ecosystem and use similar techniques: the analyzed application is interrupted at specific events, and information on the current system state is gathered to be either recorded or used for tuning. One of the established performance measurement tools is Score-P. It supports numerous HPC platforms and parallel programming paradigms. To extend Score-P with support for different back-ends, to create a common framework for the measurement and tuning of HPC applications, and to enable the re-use of common software components such as implemented instrumentation techniques, this paper makes the following contributions: (I) We describe the Score-P metric plugin interface, which enables programmers to augment the event stream with metric data from supplementary data sources that are otherwise not accessible to Score-P. (II) We introduce the flexible Score-P substrate plugin interface that can be used for custom processing of the event stream according to the specific requirements of either measurement, analysis, or runtime tuning tasks. (III) We provide examples for both interfaces that extend Score-P's functionality for monitoring and tuning purposes.
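
    To give a flavor of contribution (I), the skeleton below shows the shape of a metric plugin that reports one strictly synchronous counter. The structure and the names follow the published plugin interface (SCOREP_MetricPlugins.h) to the best of my knowledge, but they should be verified against the headers of the Score-P version in use; the counter value itself is a placeholder.

        /* Sketch of a Score-P metric plugin with a single strictly
         * synchronous, per-thread counter. */
        #include <scorep/SCOREP_MetricPlugins.h>
        #include <stdint.h>
        #include <stdlib.h>
        #include <string.h>

        static int32_t init(void) { return 0; }
        static void    fini(void) {}

        /* Describe the metric(s) matching the requested name; a NULL name
         * entry terminates the returned list. */
        static SCOREP_Metric_Plugin_MetricProperties *
        get_event_info(char *name)
        {
            SCOREP_Metric_Plugin_MetricProperties *props =
                calloc(2, sizeof(*props));
            props[0].name       = strdup(name);
            props[0].unit       = strdup("#");
            props[0].mode       = SCOREP_METRIC_MODE_ABSOLUTE_LAST;
            props[0].value_type = SCOREP_METRIC_VALUE_UINT64;
            return props;
        }

        static int32_t add_counter(char *name) { (void)name; return 0; }

        static uint64_t get_current_value(int32_t id)
        {
            (void)id;
            return 42; /* placeholder: query the actual data source here */
        }

        SCOREP_METRIC_PLUGIN_ENTRY(sketch_plugin)
        {
            SCOREP_Metric_Plugin_Info info;
            memset(&info, 0, sizeof(info));
            info.plugin_version    = SCOREP_METRIC_PLUGIN_VERSION;
            info.run_per           = SCOREP_METRIC_PER_THREAD;
            info.sync              = SCOREP_METRIC_SYNC_STRICT;
            info.initialize        = init;
            info.finalize          = fini;
            info.get_event_info    = get_event_info;
            info.add_counter       = add_counter;
            info.get_current_value = get_current_value;
            return info;
        }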

    Validation of Recent Altimeter Missions at Non-Dedicated Tide Gauge Stations in the Southeastern North Sea

    Consistent calibration and monitoring are a basic prerequisite for providing reliable time series of global and regional sea-level variations from altimetry. In this study, the precision of sea-level measurements and the regional biases of six altimeter missions (Jason-1/2/3, Envisat, Saral, Sentinel-3A) are assessed at 11 GNSS-controlled tide gauge stations in the German Bight (SE North Sea) for the period 2002 to 2019. The gauges are located partly in open water and partly at the coast close to mudflats. The altimetry is extracted at virtual stations at distances of 2 to 24 km from the gauges. The processing is optimized for the region and adjusted for comparison with instantaneous tide gauge readings. An empirical correction is developed to account for mean height gradients and slight differences in tidal dynamics between the gauge and the altimetry, which improves the agreement between the two data sets by 15–75%. The precision of the altimeters depends on the location and mission and ranges from 1.8 to 3.7 cm if the precision of the gauges is 2 cm. The accuracy of the regional mission biases depends strongly on the mean sea surface heights near the stations. The most consistent biases are obtained with the CLS2011 model, with mission-dependent accuracies from 1.3 to 3.4 cm. Hence, the GNSS-controlled tide gauges operated by the German Waterway and Shipping Administration (WSV) might complement the calibration and monitoring activities at dedicated CalVal stations.
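
    The conditional phrasing of the precision result suggests the usual error-propagation reading: if altimeter and gauge errors are independent, the variance of the altimetry-minus-gauge differences splits into the two contributions. In symbols of my choosing (not the paper's notation):

        % Assuming independent errors in the differences d = h_alt - h_gauge:
        %   \sigma_d^2 = \sigma_{\mathrm{alt}}^2 + \sigma_{\mathrm{gauge}}^2
        % so the altimeter precision follows as
        \sigma_{\mathrm{alt}} = \sqrt{\sigma_d^2 - \sigma_{\mathrm{gauge}}^2}
        % e.g. an observed \sigma_d of 4.2 cm with \sigma_{gauge} = 2 cm
        % would give \sigma_{alt} of about 3.7 cm, the upper end of the
        % range quoted above (the 4.2 cm figure is illustrative only).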

    Population-level variation in senescence suggests an important role for temperature in an endangered mollusc

    Age-related declines in survival and function (senescence) were long thought not to exist in wild populations because organisms, and invertebrates in particular, do not live long enough for senescence to manifest. Although recent evidence has demonstrated that senescence is both common and measurable in wild populations under field conditions, some organisms are still thought to exhibit “negligible senescence”. We explore variation in the rates and patterns of senescence in the biogerontological model organism Margaritifera margaritifera across five populations that differ in their age profiles. In particular, we test the theory of negligible senescence using time-at-death records for 1091 specimens of M. margaritifera. There is clear evidence of senescence in all populations, as indicated by an increase in mortality with age, but the nature of the relationship varies subtly between populations. We find strong evidence of a mortality plateau at later ages in some populations, but it is unequivocally absent from others. We then demonstrate that the temporal scaling of the rates of senescence among the five populations of M. margaritifera can be explained by variation in the thermal environment of each population. Hence, climate change may pose a threat to the demography of this long-lived, endangered species, and a greater understanding of the relationship between river temperature and population structure will be essential to secure the species against global temperature increases. Our findings demonstrate that useful insights can be drawn from a non-invasive monitoring method for deriving demographic data, and we suggest wide-scale application of this method to monitor populations across the whole latitudinal (and hence thermal) range of the species.
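
    The abstract does not name the mortality models, but the contrast it draws, a hazard rising with age versus a late-life plateau, is conventionally captured by Gompertz and logistic hazard curves. A sketch under entirely assumed parameter values, purely to illustrate the two shapes (this is not the paper's analysis):

        /* The Gompertz hazard rises exponentially with age; the logistic
         * form levels off toward the plateau b/s at high ages. All
         * parameter values are invented for illustration. */
        #include <stdio.h>
        #include <math.h>

        static double gompertz(double a, double b, double x)
        {
            return a * exp(b * x);
        }

        static double logistic(double a, double b, double s, double x)
        {
            return a * exp(b * x) / (1.0 + (s * a / b) * (exp(b * x) - 1.0));
        }

        int main(void)
        {
            double a = 0.001, b = 0.08, s = 0.5; /* assumed values */
            for (int age = 0; age <= 100; age += 20)
                printf("age %3d: gompertz %.4f  logistic %.4f\n",
                       age, gompertz(a, b, age), logistic(a, b, s, age));
            return 0;
        }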